## [1] "50.95 % of sequences were merged"
## [1] "11.21 % of sequences were dropped"
## [1] "8.104 % of sequences were ribosomal and removed"
## [1] "Three samples had a high % of ribosomes removed"
## V1
## 1 HI.4752.003.NEBNext_Index_13.000576_ChampSt1-20160915-WatPhotz_RNAa
## 2 HI.4752.003.NEBNext_Index_14.000577_ChampSt1-20160915-WatPhotz_RNAb
## 3 HI.4752.003.NEBNext_Index_15.000578_ChampSt1-20160915-WatPhotz_RNAc
## V9
## 1 59.60
## 2 63.94
## 3 66.96
## [1] "431147: minimum nb sequence annotated per sample (refseq)"
## [1] "31847464: maximum nb sequence annotated per sample (refseq)"
## [1] "2049000: mean nb sequence annotated per sample (refseq)"
## [1] "108955: minimum nb sequence annotated per sample (subsys)"
## [1] "16733558: maximum nb sequence annotated per sample (subsys)"
## [1] "1001000: mean nb sequence annotated per sample (subsys)"
## [1] "0.7181: Total Number of paired-end reads (billions)"
## [1] "0.15: Total Number of sequences after cleaning+merging (billions)"
## [1] "93.37% : percentage of sequences annotated (refseq)"
## [1] "45.62% : percentage of sequences annotated (subsys)"
samtools faidx Inediibacterium_massiliense_genome.fasta
samtools faidx Inediibacterium_massiliense_genome.fasta NZ_LN876587.1:1637150-1637400 >I_massiliense_contig1.fasta
samtools faidx Blautia_massiliensis_GD8_NZ_LN913006_WGS.fasta
samtools faidx Blautia_massiliensis_GD8_NZ_LN913006_WGS.fasta NZ_LN913006.1:2359000-2360200 >B_massiliensis_contig1.fasta
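As a reminder of what the region syntax means: `samtools faidx` reads `name:start-end` as 1-based, inclusive coordinates, so the two extractions above should give 251 bp and 1201 bp. Below is a minimal pure-Python sketch of the same slicing, useful for sanity-checking an extracted contig. The function names and the toy sequence are made up for illustration; this is not how samtools is implemented.

```python
def parse_region(region):
    """Split 'name:start-end' into (name, start, end); coordinates are 1-based, inclusive."""
    name, _, span = region.rpartition(":")
    start, end = span.split("-")
    return name, int(start), int(end)


def read_fasta(lines):
    """Return {sequence name: sequence} from an iterable of FASTA lines."""
    chunks, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # the name is the first word of the header
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def extract_region(seqs, region):
    """Slice a region out of a parsed FASTA dict, mimicking the faidx coordinate convention."""
    name, start, end = parse_region(region)
    return seqs[name][start - 1:end]  # convert 1-based inclusive to a Python slice


# Toy sequence (hypothetical, not from either genome)
toy = [">chr_toy some description", "ACGTAC", "GTTTGA"]
seqs = read_fasta(toy)
print(extract_region(seqs, "chr_toy:3-8"))  # prints GTACGT

# Expected lengths of the two regions extracted above
for region in ("NZ_LN876587.1:1637150-1637400", "NZ_LN913006.1:2359000-2360200"):
    _, start, end = parse_region(region)
    print(region, end - start + 1)
```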
Check the Microcystis in the database and in the list of annotated genes?
There are 67425 Microcystis gene products (63.5k are for M. aeruginosa) in the dataset
This is a very good representation
Get a Microcystis protein set from Olga?
Done, but not sure how useful, given that it is actually well represented in the dataset
Check Illumina primer/adaptor sequences (find them on Nanuq) to see whether they may cause contamination (esp. for the 09-15 samples)?
Done. Refiltered samples as there was indeed some contamination
Didn’t change results very much
Check sequence similarity between the 2 fecal species proteins
Done. They are almost the same…
Theoretically, there should be a good correlation between annotations to Rubisco + Photosystem and chlorophyll content. I should verify this (a simple correlation, or I can also perform a more complex RDA)
Not done yet, but I suspect correlation is not great… Why: I don’t know
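For the simple-correlation version of this check, a minimal sketch could look like the one below. It computes Pearson and Spearman coefficients with the standard library only. The per-sample values are invented placeholders, not results from this dataset; the real inputs would be the per-sample fraction of reads annotated to Rubisco + Photosystem and the measured chlorophyll content.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical placeholder values, one entry per sample
photo_fraction = [0.021, 0.034, 0.015, 0.048, 0.029]  # fraction of annotated reads
chlorophyll = [3.1, 5.6, 2.2, 6.0, 6.4]               # chlorophyll content, arbitrary units

print("Pearson :", round(pearson(photo_fraction, chlorophyll), 3))
print("Spearman:", round(spearman(photo_fraction, chlorophyll), 3))
```

Spearman is probably the safer first look here, since read fractions and chlorophyll need not be linearly related.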
Check annotation at the other three functional levels (right now, I only look at the first one…)
Done at all 4 levels
eggNOG annotation
Not done
I can do a PCA to look at sample clustering based on communities
Done using phyloseq (Sept 15th samples were removed because they skew the ordinations so much…)
I can do an RDA to see how certain samples may correlate with environmental data
Not done, as environmental variables are being updated now
Look at alpha + beta diversity
alpha done: not much going on
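For reference, the two quantities behind "alpha" and "beta" can be sketched in a few lines. This is a standalone illustration using the Shannon index (alpha) and Bray-Curtis dissimilarity (beta); it is an assumption that these are the relevant metrics, and it is not the phyloseq code used for the actual analysis. The count vectors are toy numbers.

```python
import math


def shannon(counts):
    """Shannon diversity index H' (natural log) from a vector of taxon counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples with taxa in the same order."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator


# Toy count vectors (hypothetical), one value per taxon
sample_a = [120, 30, 30, 20]
sample_b = [50, 50, 50, 50]

print("Shannon A:", round(shannon(sample_a), 3))
print("Shannon B:", round(shannon(sample_b), 3))
print("Bray-Curtis A vs B:", round(bray_curtis(sample_a, sample_b), 3))
```

Bray-Curtis is sensitive to sequencing depth, so counts would normally be rarefied or converted to relative abundances first.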
Metagenome correlated with metatranscriptome
In progress: this is probably the most promising analysis
Check ocean metagenome vs. metatranscriptome papers for ideas/references
Fold changes for functions… maybe, but not super exciting
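If the fold-change idea is revisited, one simple way to tie it to the metagenome vs. metatranscriptome comparison is a per-function log2 ratio of relative abundances (RNA over DNA). The sketch below assumes count tables keyed by function name; the function names, counts, and pseudocount are all hypothetical, and a real analysis would more likely use a dedicated differential-abundance tool.

```python
import math


def relative_abundance(counts):
    """Convert raw counts per function into fractions that sum to 1."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def log2_ratio(rna, dna, pseudo=1e-6):
    """Per-function log2(RNA / DNA) of relative abundances.

    The pseudocount (an arbitrary choice here) avoids division by zero
    for functions missing from one of the two datasets.
    """
    names = sorted(set(rna) | set(dna))
    return {
        n: math.log2((rna.get(n, 0.0) + pseudo) / (dna.get(n, 0.0) + pseudo))
        for n in names
    }


# Hypothetical counts for one sample
metat_counts = {"photosynthesis": 300, "respiration": 500, "motility": 200}
metag_counts = {"photosynthesis": 100, "respiration": 600, "motility": 300}

ratios = log2_ratio(relative_abundance(metat_counts), relative_abundance(metag_counts))
for name, value in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}\t{value:+.2f}")
```

Positive values mean a function is over-represented in transcripts relative to gene copies, and negative values mean the opposite.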